Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ETCM-73] Transactional blockchain #673

Merged
merged 1 commit into from
Sep 21, 2020

Conversation

pslaski
Copy link
Contributor

@pslaski pslaski commented Sep 16, 2020

Description

The goal is to make part of the blockchain methods transactional.

Important Changes Introduced

  • introduced DataSourceBatchUpdate
  • blockchain methods which are saving to DB are transactional now
  • removed NodeStorage interface (NodeStorage and CachedNodeStorage are independent now)

Testing

check syncing and pruning

MISC

Storing best-known block number mechanism is using AtomicReference + DB now. Is it worth to create a ticket which will simplify that?

@pslaski pslaski force-pushed the etcm-73-transactional-blockchain branch 3 times, most recently from e7a15e1 to ab83593 Compare September 16, 2020 16:19
Copy link

@ntallar ntallar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed NodeStorage interface (NodeStorage and CachedNodeStorage are independent now)

Was this needed? Our MPT node handling is way too overly complicated and I don't see the benefit of that change, should we maybe keep it as is till we dedicate fully to a better design of the whole of it?

Storing best-known block number mechanism is using AtomicReference + DB now. Is it worth to create a ticket which will simplify that?

For me yes! But if it's not a blocker for any task we'll probably kick working on it for later

@@ -98,7 +97,7 @@ class ReferenceCountedStateStorage(private val nodeStorage: NodeStorage,
}

override def saveNode(nodeHash: NodeHash, nodeEncoded: NodeEncoded, bn: BigInt): Unit = {
new FastSyncNodeStorage(nodeStorage, bn).update(Nil, Seq(nodeHash -> nodeEncoded))
new FastSyncNodeStorage(cachedNodeStorage, bn).updateInStorage(Nil, Seq(nodeHash -> nodeEncoded))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the nodeStorage replaced here by the cachedNodeStorage?


override def persist(): Unit = {}
override def persist(): Unit = {
cachedNodeStorage.forcePersist()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not cachedNodeStorage.persist()?


object CachedNodeStorage {
// special storage where we want to minimize cache layer because datasource is in memory
def withMinimalCache(dataSource: EphemDataSource): CachedNodeStorage =
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the 0 below mean that it has no caching instead of minimal?

@pslaski pslaski force-pushed the etcm-73-transactional-blockchain branch from ab83593 to ad927d5 Compare September 17, 2020 07:55
Copy link
Contributor

@KonradStaniec KonradStaniec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments for now.
General remarks:

Our MPT node handling is way too overly complicated and I don't see the benefit of that change, should we maybe keep it as is till we dedicate fully to a better design of the whole of it?

I agree that better desging would be needed but it would need to take all constrains which kinda grow organically i.e diefferent modes of pruning and the fact that handling state nodes is one of the main bottleneck of sync so performance here is critical.

Storing best-known block number mechanism is using AtomicReference + DB now. Is it worth to create a ticket which will simplify that?

What do you have in mind by this simplification ? I mean the main constraint here is that it needs to be quite performant as it is frequently used thing, and we cannont update best block in db in each best block update as we do not now if node cache has been flushed. So the idea here is to udpdate the block in db only when flushing the cache, and otherwise operate on this counter.

@@ -9,38 +9,37 @@ class EphemDataSource(var storage: Map[ByteBuffer, Array[Byte]]) extends DataSou
* key.drop to remove namespace prefix from the key
* @return key values paris from this storage
*/
def getAll(namespace: Namespace): Seq[(IndexedSeq[Byte], IndexedSeq[Byte])] =
storage.toSeq.map{case (key, value) => (key.array().drop(namespace.length).toIndexedSeq, value.toIndexedSeq)}
def getAll(namespace: Namespace): Seq[(IndexedSeq[Byte], IndexedSeq[Byte])] = synchronized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those synchronised blocks necessary or you just added them to keep consistent with docs on data source Implementations should guarantee that the whole operation is atomic. ?
I am asking as we are using this Storage when fast syncing in one threaded scenarion, and synchronized blocks will probably add some unnecessary overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same change was introduced in our first template PR ;)

* If a key is already in the DataSource its value will be updated.
* @return the new DataSource after the removals and insertions were done.
*/
def updateOptimized(toRemove: Seq[Array[Byte]], toUpsert: Seq[(Array[Byte], Array[Byte])]): DataSource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary to remove this methods ? Main reason for them was the fact that:

  • in case of NodeStorage our key is type NodeHash = ByteString and value type NodeEncoded = Array[Byte]
  • when going thorough generic serializers we convert those values into IndexedSeq[Byte]
  • then when writing to storage we convert them both to Array[Byte]
    and this process is really wasteful and in case a lot of nodes in mpt trie (like in eth) those conversions becomes huge bottleneck for syncing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created new DataSourceUpdateOptimized type for optimized execution

@@ -11,21 +11,34 @@ import io.iohk.ethereum.db.storage.encoding._
* node saved will have its reference count equal to 1.
*
* */
class FastSyncNodeStorage(nodeStorage: NodesStorage, bn: BigInt) extends ReferenceCountNodeStorage(nodeStorage, bn) {
class FastSyncNodeStorage(cachedNodeStorage: CachedNodeStorage, bn: BigInt) extends ReferenceCountNodeStorage(cachedNodeStorage, bn) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastSync does not have concept of caching it need to save data into database directly.

this.dataSource eq that.dataSource,
"Transactional storage updates must be performed on the same data source"
)
DataSourceBatchUpdate(dataSource, this.updates ++ that.updates)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if queue with appending or list with prepending would not be better here ? As otherwise when batching a lot of updates from list (like in FastSync) we have quite an overhead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pslaski pslaski force-pushed the etcm-73-transactional-blockchain branch 4 times, most recently from 47907b6 to 0982653 Compare September 17, 2020 14:53
Copy link

@ntallar ntallar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you have in mind by this simplification ? I mean the main constraint here is that it needs to be quite performant as it is frequently used thing, and we cannont update best block in db in each best block update as we do not now if node cache has been flushed. So the idea here is to udpdate the block in db only when flushing the cache, and otherwise operate on this counter.

Hmm given that, couldn't we hide the bestKnowBlock (which I would rename to inMemoryBestBlock or cachedBestBlock) in the blockchain interface? That is, instead of saveBestKnownBlock and saveBestBlock have a single saveBestBlock

I see the constraint that this can't be updated before flushing the nodes cache but not the performance constraint. Usually when updating the bestBlock it's because we are removing or inserting whole blocks, aren't those operations much heavier than this?

storage = afterUpdate
}

def updateOptimized(toRemove: Seq[Array[Byte]], toUpsert: Seq[(Array[Byte], Array[Byte])]): Unit = synchronized {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: should this and the update method (def update(namespace: Namespace, toRemove: Seq[Key], toUpsert: Seq[(Key, Value)]): Unit) be private?

@pslaski pslaski force-pushed the etcm-73-transactional-blockchain branch from 0982653 to 5be5f3f Compare September 18, 2020 15:19
Copy link

@ntallar ntallar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

 .\\            //.
. \ \          / /.
.\  ,\     /` /,.-
 -.   \  /'/ /  .
 ` -   `-'  \  -
   '.       /.\`
      -    .-
      :`//.'
      .`.'
      .' BP 

@pslaski pslaski force-pushed the etcm-73-transactional-blockchain branch from 5be5f3f to 01484e1 Compare September 21, 2020 08:02
@pslaski pslaski force-pushed the etcm-73-transactional-blockchain branch from 01484e1 to 3144d38 Compare September 21, 2020 08:06
@pslaski pslaski merged commit 6691f66 into develop Sep 21, 2020
@pslaski pslaski deleted the etcm-73-transactional-blockchain branch September 21, 2020 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants